88 research outputs found

    Clustering with shallow trees

    We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing method that solves it efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. Using this general scheme, we analyze three biological/medical structured datasets (human population based on genetic information, proteins based on sequences and verbal autopsies) and show that the interpolation technique provides new insight. Comment: 11 pages, 7 figures

    Finding undetected protein associations in cell signaling by belief propagation

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field. Comment: 6 pages, 3 figures, 1 table, Supporting Information

    Beyond inverse Ising model: structure of the analytical solution for a class of inverse problems

    I consider the problem of deriving couplings of a statistical model from measured correlations, a task which generalizes the well-known inverse Ising problem. After recalling that this problem can be mapped onto that of expressing the entropy of a system as a function of its corresponding observables, I show the conditions under which this can be done without resorting to iterative algorithms. I find that inverse problems are local (the inverse Fisher information is sparse) whenever the corresponding models have a factorized form, and the entropy can be split into a sum of small cluster contributions. I illustrate these ideas through two examples (the Ising model on a tree and the one-dimensional periodic chain with arbitrary order interaction) and support the results with numerical simulations. The extension of these methods to more general scenarios is finally discussed. Comment: 15 pages, 6 figures

    Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms

    Motivation: Reconstructing the topology of a gene regulatory network is one of the key tasks in systems biology. Despite the wide variety of proposed methods, very little work has been dedicated to the assessment of their stability properties. Here we present a methodical comparison of the performance of a novel method (RegnANN) for gene network inference based on multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER), focussing our analysis on the prediction variability induced by both the network intrinsic structure and the available data. Results: The extensive evaluation on both synthetic data and a selection of gene modules of "Escherichia coli" indicates that all the algorithms suffer from instability and variability issues with regard to the reconstruction of the topology of the network. This instability makes it objectively very hard to establish which method performs best. Nevertheless, RegnANN shows MCC scores that compare very favorably with all the other inference methods tested. Availability: The software for the RegnANN inference algorithm is distributed under GPL3 and is available at the corresponding author's home page (http://mpba.fbk.eu/grimaldi/regnann-supmat

    Synonymous codon usage influences the local protein structure observed

    Translation of mRNA into protein is a unidirectional information flow process. Analysing the input (mRNA) and output (protein) of translation, we find that local protein structure information is encoded in the mRNA nucleotide sequence. The Coding Sequence and Structure (CSandS) database developed in this work provides a detailed mapping between over 4000 solved protein structures and their mRNA. CSandS facilitates a comprehensive analysis of codon usage over many organisms. In assigning translation speed, we find that relative codon usage is less informative than tRNA concentration. For all speed measures, no evidence was found that domain boundaries are enriched with slow codons. In fact, genes seemingly avoid slow codons around structurally defined domain boundaries. Translation speed, however, does decrease at the transition into secondary structure. Codons are identified that have structural preferences significantly different from the amino acid they encode. However, each organism has its own set of ‘significant codons’. Our results support the premise that codons encode more information than merely amino acids and give insight into the role of translation in protein folding.

    Increasing the Depth of Current Understanding: Sensitivity Testing of Deep-Sea Larval Dispersal Models for Ecologists

    Larval dispersal is an important ecological process of great interest to conservation and the establishment of marine protected areas. Increasing numbers of studies are turning to biophysical models to simulate dispersal patterns, including in the deep sea, but for many ecologists unassisted by a physical oceanographer, a model can present as a black box. Sensitivity testing offers a means to test the models' abilities and limitations and is a starting point for all modelling efforts. The aim of this study is to illustrate a sensitivity testing process for the unassisted ecologist, through a deep-sea case study example, and demonstrate how sensitivity testing can be used to determine optimal model settings, assess model adequacy, and inform ecological interpretation of model outputs. Five input parameters are tested (timestep of particle simulator (TS), horizontal (HS) and vertical separation (VS) of release points, release frequency (RF), and temporal range (TR) of simulations) using a commonly employed pairing of models. The procedures used are relevant to all marine larval dispersal models. It is shown how the results of these tests can inform the future setup and interpretation of ecological studies in this area. For example, an optimal arrangement of release locations spanning a release area could be deduced; the increased depth range spanned in deep-sea studies may necessitate the stratification of dispersal simulations with different numbers of release locations at different depths; no fewer than 52 releases per year should be used unless biologically informed; three years of simulations chosen based on climatic extremes may provide results with 90% similarity to five years of simulation; and this model setup is not appropriate for simulating rare dispersal events. A step-by-step process, summarising advice on the sensitivity testing procedure, is provided to inform all future unassisted ecologists looking to run a larval dispersal simulation.

    The Cost of Virulence: Retarded Growth of Salmonella Typhimurium Cells Expressing Type III Secretion System 1

    Virulence factors generally enhance a pathogen's fitness and thereby foster transmission. However, most studies of pathogen fitness have been performed by averaging the phenotypes over large populations. Here, we have analyzed the fitness costs of virulence factor expression by Salmonella enterica subspecies I serovar Typhimurium in simple culture experiments. The type III secretion system ttss-1, a cardinal virulence factor for eliciting Salmonella diarrhea, is expressed by just a fraction of the S. Typhimurium population, yielding a mixture of cells that either express ttss-1 (TTSS-1+ phenotype) or not (TTSS-1− phenotype). Here, we studied in vitro the TTSS-1+ phenotype at the single cell level using fluorescent protein reporters. The regulator hilA controlled the fraction of TTSS-1+ individuals and their ttss-1 expression level. Strikingly, cells of the TTSS-1+ phenotype grew slower than cells of the TTSS-1− phenotype. The growth retardation was at least partially attributable to the expression of TTSS-1 effector and/or translocon proteins. In spite of this growth penalty, the TTSS-1+ subpopulation increased from <10% to approx. 60% during the late logarithmic growth phase of an LB batch culture. This was attributable to an increasing initiation rate of ttss-1 expression, in response to environmental cues accumulating during this growth phase, as shown by experimental data and mathematical modeling. Finally, hilA and hilD mutants, which form only fast-growing TTSS-1− cells, outcompeted wild type S. Typhimurium in mixed cultures. Our data demonstrated that virulence factor expression imposes a growth penalty in a non-host environment. This raises important questions about compensating mechanisms during host infection which ensure successful propagation of the genotype.

    Genomic, Proteomic and Physiological Characterization of a T5-like Bacteriophage for Control of Shiga Toxin-Producing Escherichia coli O157:H7

    Despite multiple control measures, Escherichia coli O157:H7 (STEC O157:H7) continues to be responsible for many foodborne outbreaks in North America and elsewhere. Bacteriophage therapy may prove useful for controlling this pathogen in the host, their environment and food. Bacteriophage vB_EcoS_AKFV33 (AKFV33), a T5-like phage of Siphoviridae, lysed common phage types of STEC O157:H7 but not non-O157 E. coli. Moreover, STEC O157:H7 isolated from the same feedlot pen from which the phage was obtained were highly susceptible to AKFV33. Adsorption rate constant and burst size were estimated to be 9.31×10⁻⁹ ml/min and 350 PFU/infected cell, respectively. The genome of AKFV33 was 108,853 bp (38.95% G+C), containing 160 open reading frames (ORFs), 22 tRNA genes and 32 strong promoters recognized by host RNA polymerase. Of 12 ORFs without homologues to T5-like phages, 7 predicted novel proteins while others exhibited low identity (<60%) to proteins in the National Centre for Biotechnology Information database. AKFV33 also lacked the L-shaped tail fiber protein typical of T5, but was predicted to have tail fibers comprised of 2 novel proteins with low identity (37–41%) to tail fibers of E. coli phage phiEco32 of Podoviridae, a putative side tail fiber protein of a prophage from E. coli IAI39 and a conserved domain protein of E. coli MS196-1. The receptor-binding tail protein (pb5) shared an overall identity of 29–72% to that of other T5-like phages, with no region coding for more than 6 amino acids in common. Proteomic analysis identified 4 structural proteins corresponding to the capsid, major tail, tail fiber and pore-forming tail tip (pb2). The genome of AKFV33 lacked regions coding for known virulence factors, integration-related proteins or antibiotic resistance determinants. Phage AKFV33 is a unique, highly lytic STEC O157:H7-specific T5-like phage that may have considerable potential as a pre- and post-harvest biocontrol agent.

    Shape similarity, better than semantic membership, accounts for the structure of visual object representations in a population of monkey inferotemporal neurons

    The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (i.e., represent conceptual classes such as animate and inanimate objects). In this study, we investigated to what extent semantic, rather than purely visual information, is represented in IT by performing a multivariate analysis of IT responses to a set of visual objects. By relying on a variety of machine-learning approaches (including a cutting-edge clustering algorithm that has been recently developed in the domain of statistical physics), we found that, in most instances, IT representation of visual objects is accounted for by their similarity at the level of shape or, more surprisingly, low-level visual properties. Only in a few cases did we observe IT representations of semantic classes that were not explainable by the visual similarity of their members. Overall, these findings reassert the primary function of IT as a conveyor of explicit visual shape information, and reveal that low-level visual properties are represented in IT to a greater extent than previously appreciated. In addition, our work demonstrates how combining a variety of state-of-the-art multivariate approaches, and carefully estimating the contribution of shape similarity to the representation of object categories, can substantially advance our understanding of neuronal coding of visual objects in cortex.